Abstract. Although the notion of a concept as a collection of objects sharing certain properties, and the notion of a conceptual hierarchy are fundamental to both Formal Concept Analysis and Description Logics, the ways concepts are described and obtained differ significantly between these two research areas. Despite these differences, there have been several attempts to bridge the gap between these two formalisms, and attempts to apply methods from one field in the other. The present work aims to give an overview of the research done in combining Description Logics and Formal Concept Analysis.
1 Introduction



Formal Concept Analysis (FCA) [58] is a field of applied mathematics that aims to formalize the notions of a concept and a conceptual hierarchy by means of mathematical tools. Description Logics (DLs) [3], on the other hand, are a class of logic-based knowledge representation formalisms that are used to represent the conceptual knowledge of an application domain in a structured way. Although the notion of a concept as a collection of objects sharing certain properties, and the notion of a conceptual hierarchy are fundamental to both FCA and DLs, the ways concepts are described and obtained differ significantly between these two research areas. In DLs, the relevant concepts of the application domain are formalized by so-called concept descriptions, which are expressions built from unary predicates (called atomic concepts) and binary predicates (called atomic roles) with the help of the concept constructors provided by the DL language. In a second step, these concept descriptions are used to describe properties of individuals occurring in the domain, and the roles are used to describe relations between these individuals. In FCA, on the other hand, one starts with a so-called formal context, which in its simplest form specifies which attributes are satisfied by which objects. A formal concept of such a context is a pair consisting of a set of objects, called the extent, and a set of attributes, called the intent, such that the intent consists of exactly those attributes that the objects in the extent have in common, and the extent consists of exactly those objects that share all attributes in the intent.
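The two derivation operators behind this definition are easy to state in code. The following Python sketch is only an illustration, assuming a formal context given as a dictionary from objects to attribute sets; the toy context (three countries and an ocean) and all function names are hypothetical. It enumerates formal concepts by brute force, using the fact that $(A'', A')$ is a formal concept for every set of objects $A$.

```python
from itertools import combinations

# Hypothetical toy formal context: each object is mapped to the attributes it has.
CONTEXT = {
    "Austria": {"country", "landlocked"},
    "Portugal": {"country", "coastal"},
    "Spain": {"country", "coastal"},
    "Atlantic": {"ocean"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def common_attributes(objects):
    """A': the attributes shared by all objects in A (all attributes if A is empty)."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return frozenset(shared)


def common_objects(attributes):
    """B': the objects that have every attribute in B."""
    return frozenset(g for g, attrs in CONTEXT.items() if set(attributes) <= attrs)


def is_formal_concept(extent, intent):
    """(A, B) is a formal concept iff A' = B and B' = A."""
    return common_attributes(extent) == set(intent) and common_objects(intent) == set(extent)


def formal_concepts():
    """Brute force: (A'', A') is a formal concept for every subset A of the objects."""
    objects = list(CONTEXT)
    concepts = set()
    for k in range(len(objects) + 1):
        for subset in combinations(objects, k):
            intent = common_attributes(subset)
            concepts.add((common_objects(intent), intent))
    return concepts


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

The brute-force enumeration is exponential in the number of objects; it is meant only to make the definition tangible, not as a practical FCA algorithm.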

There are several differences between these approaches. First, in FCA one starts with a purely extensional description of the application domain, and then derives the formal concepts of this specific domain, which provide a useful structuring. In a way, in FCA the intensional knowledge is obtained from the extensional part of the knowledge. In DLs, on the other hand, the intensional definition of a concept is given independently of a specific domain (interpretation), and the description of the individuals is only partial. Second, in FCA the properties are atomic, and the intensional description of a formal concept (by its intent) is just a conjunction of such properties. DLs usually provide a richer language for the intensional definition of concepts, which can be seen as an expressive, yet decidable, sublanguage of first-order predicate logic.

Despite these differences, there have been several attempts to bridge the gap between these two formalisms, and attempts to apply methods from one field to the other. For example, there have been efforts to enrich FCA with more complex properties similar to concept constructors in DLs [60, 45, 44, 23, 46]. On the other
hand, DL research has benefited from FCA methods to solve some problems encountered in knowledge representation using DLs [1, 55, 10, 12, 48, 14, 49, 52, 16, 7, 53, 50, 4, 5, 13].
The present work aims to give an overview of the work done to bridge the gap between the two formalisms. In Section 2 we give a short introduction to DLs without going into technical details. We assume that the reader is familiar with FCA and refer to [58] for details. In Section 3 we summarize the existing work done by other researchers in the field. In Section 4 we summarize our own contributions to the field, and we conclude with Section 5.



2 Description Logics 

Description Logics (DLs) [3] are a class of knowledge representation formalisms that are used to represent the terminological knowledge of an application domain in a structured way. Since their introduction, DLs have been used in various application domains such as medical informatics, software engineering, configuration of technical systems, natural language processing, databases and web-based information systems. Their most notable success so far, however, is the adoption of the DL-based language OWL¹ [34] as the standard ontology language for the semantic web [13].



Syntax. In DLs, one formalizes the relevant notions of an application domain by concept descriptions. A concept description is an expression built from atomic concepts, which are unary predicates, and atomic roles, which are binary predicates, by using the concept constructors provided by the particular DL language in use. DL languages are distinguished by the concept constructors they allow. For instance, the smallest propositionally closed language allowing for the constructors $\sqcap$ (conjunction), $\sqcup$ (disjunction), $\neg$ (negation), $\forall$ (value restriction) and $\exists$ (existential restriction) is called $\mathcal{ALC}$.



¹ Web Ontology Language. See http://www.w3.org/TR/owl-features



Typically, a DL knowledge base consists of a terminological box (TBox), which defines the terminology of an application domain, and an assertional box (ABox), which contains facts about a specific world. In its simplest form, a TBox is a set of concept definitions of the form $A \equiv C$ that assign the concept name $A$ to the concept description $C$. We call a finite set of general concept inclusion (GCI) axioms a general TBox. A GCI is an expression of the form $C \sqsubseteq D$, where $C$ and $D$ are two possibly complex concept descriptions. It states a subconcept/superconcept relationship between the two concept descriptions. An ABox is a set of concept assertions of the form $C(a)$, which means that the individual $a$ is an instance of the concept $C$, and role assertions of the form $R(a, b)$, which means that the individual $a$ is in $R$-relation with the individual $b$.

For instance, the following TBox contains the definition of a landlocked country, which is a country that only has borders on land, and the definition of an ocean country, which is a country that has a border to an ocean:

$$
\mathcal{T} := \{\, \mathsf{LandlockedCountry} \equiv \mathsf{Country} \sqcap \forall \mathsf{hasBorderTo}.\mathsf{Land},\;\; \mathsf{OceanCountry} \equiv \mathsf{Country} \sqcap \exists \mathsf{hasBorderTo}.\mathsf{Ocean} \,\}
$$

The following ABox states facts about the individuals Portugal, Austria, and AtlanticOcean:

$$
\mathcal{A} := \{\, \mathsf{LandlockedCountry}(\mathsf{Austria}),\; \mathsf{Country}(\mathsf{Portugal}),\; \mathsf{Ocean}(\mathsf{AtlanticOcean}),\; \mathsf{hasBorderTo}(\mathsf{Portugal}, \mathsf{AtlanticOcean}) \,\}
$$

Semantics. The meaning of DL concepts is given by means of an interpretation $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$, which is a tuple consisting of a domain $\Delta^{\mathcal{I}}$ and an interpretation function $\cdot^{\mathcal{I}}$. The interpretation function maps every concept name to a subset of the domain, every role name to a binary relation on the domain, and every individual name occurring in the ABox to an element of the domain. The meaning of complex concept descriptions is given inductively based on the constructors used in the concept description.

For instance, the concept description $\mathsf{Country} \sqcap \exists \mathsf{hasBorderTo}.\mathsf{Ocean}$ is interpreted as the intersection of the set of countries and the set of elements of the domain that have a border to an ocean. We say that an interpretation $\mathcal{I}$ is a model of a TBox $\mathcal{T}$ if it satisfies all axioms in $\mathcal{T}$, i.e., for every concept definition $A \equiv C$ in $\mathcal{T}$ it maps $A$ and $C$ to the same subset of the domain (and, for a general TBox, $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ holds for every GCI $C \sqsubseteq D$). Similarly, we say that $\mathcal{I}$ is a model of an ABox $\mathcal{A}$ if it satisfies all concept and role assertions in $\mathcal{A}$, i.e., for every concept assertion $C(a)$ in $\mathcal{A}$, the interpretation of $a$ is an element of the interpretation of $C$, and for every role assertion $R(a, b)$ the interpretation of $R$ contains the pair consisting of the interpretations of $a$ and $b$. The semantics of DL ABoxes is the open-world semantics, i.e., absence of information about an individual is not interpreted as negative information, but only indicates lack of knowledge about that individual.
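To make the inductive semantics concrete, here is a small Python sketch that computes the extension of an $\mathcal{ALC}$ concept description in one finite interpretation and checks whether that interpretation is a model of the example TBox and ABox. It is only an illustration: the tuple encoding of concepts, the function names, and the particular interpretation (including the auxiliary domain element `land1`) are assumptions chosen for this example. Checking a single finite model is not reasoning; entailments such as instance checking quantify over all models of the knowledge base.

```python
# Concepts are encoded as nested tuples (a hypothetical encoding for this sketch):
#   ("top",), ("name", A), ("not", C), ("and", C, D), ("or", C, D),
#   ("some", r, C) for an existential restriction, ("all", r, C) for a value restriction.

def extension(concept, interp):
    """Return the set of domain elements that interp assigns to concept."""
    domain = interp["domain"]
    kind = concept[0]
    if kind == "top":
        return set(domain)
    if kind == "name":
        return set(interp["concepts"].get(concept[1], set()))
    if kind == "not":
        return set(domain) - extension(concept[1], interp)
    if kind == "and":
        return extension(concept[1], interp) & extension(concept[2], interp)
    if kind == "or":
        return extension(concept[1], interp) | extension(concept[2], interp)
    if kind in ("some", "all"):
        role, filler_ext = concept[1], extension(concept[2], interp)
        pairs = interp["roles"].get(role, set())
        result = set()
        for d in domain:
            successors = {e for (x, e) in pairs if x == d}
            if kind == "some" and successors & filler_ext:
                result.add(d)  # some r-successor lies in the filler
            if kind == "all" and successors <= filler_ext:
                result.add(d)  # every r-successor lies in the filler (vacuous if none)
        return result
    raise ValueError(f"unknown constructor: {kind}")


# The TBox from the text: concept name -> defining concept description.
TBOX = {
    "LandlockedCountry": ("and", ("name", "Country"), ("all", "hasBorderTo", ("name", "Land"))),
    "OceanCountry": ("and", ("name", "Country"), ("some", "hasBorderTo", ("name", "Ocean"))),
}

# One finite interpretation, assumed for illustration.
INTERP = {
    "domain": {"portugal", "austria", "atlantic", "land1"},
    "concepts": {
        "Country": {"portugal", "austria"},
        "Ocean": {"atlantic"},
        "Land": {"land1"},
        "LandlockedCountry": {"austria"},
        "OceanCountry": {"portugal"},
    },
    "roles": {"hasBorderTo": {("portugal", "atlantic"), ("austria", "land1")}},
}

# The ABox from the text, plus the mapping of individual names to domain elements.
INDIVIDUALS = {"Portugal": "portugal", "Austria": "austria", "AtlanticOcean": "atlantic"}
ABOX_CONCEPTS = [("LandlockedCountry", "Austria"), ("Country", "Portugal"), ("Ocean", "AtlanticOcean")]
ABOX_ROLES = [("hasBorderTo", "Portugal", "AtlanticOcean")]


def is_model_of_tbox(tbox, interp):
    """interp is a model of the TBox iff every definition A ≡ C has A^I == C^I."""
    return all(extension(("name", a), interp) == extension(c, interp) for a, c in tbox.items())


def is_model_of_abox(interp):
    """interp is a model of the ABox iff all concept and role assertions hold."""
    ind = INDIVIDUALS
    concepts_ok = all(ind[a] in extension(("name", c), interp) for c, a in ABOX_CONCEPTS)
    roles_ok = all((ind[a], ind[b]) in interp["roles"].get(r, set()) for r, a, b in ABOX_ROLES)
    return concepts_ok and roles_ok


if __name__ == "__main__":
    print(is_model_of_tbox(TBOX, INTERP), is_model_of_abox(INTERP))
    print(extension(TBOX["OceanCountry"], INTERP))
```

In this particular model Portugal indeed falls under the definition of OceanCountry; the next paragraphs explain why this holds in every model.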

Inferences. Once we have a description of the application domain using DLs as described above, we can make inferences, i.e., deduce implicit consequences from the explicitly represented knowledge. The basic inference on concept descriptions is subsumption. Given two concept descriptions $C$ and $D$, the subsumption problem $C \sqsubseteq D$ is the problem of checking whether the concept description $D$ is more general than the concept description $C$. In other words, it is the problem of determining whether the first concept always, i.e., in every interpretation, denotes a subset of the set denoted by the second one. We say that $C$ is subsumed by $D$ w.r.t. a TBox $\mathcal{T}$ if in every model of $\mathcal{T}$, $D$ is more general than $C$, i.e., the interpretation of $C$ is a subset of the interpretation of $D$. We denote this as $C \sqsubseteq_{\mathcal{T}} D$. For instance, in the example above, the concepts LandlockedCountry and OceanCountry are both trivially subsumed by the concept Country w.r.t. $\mathcal{T}$.

The typical inference problem for ABoxes is instance checking, which is the problem of deciding whether the interpretation of a given individual is an element of the interpretation of a given concept in every common model of the TBox and the ABox. For instance, from $\mathcal{T}$ and $\mathcal{A}$ given above it follows that Portugal is an ocean country, although $\mathcal{A}$ does not contain the assertion $\mathsf{OceanCountry}(\mathsf{Portugal})$: in every model, Portugal is a country with a border to the ocean AtlanticOcean, hence it belongs to $\mathsf{Country} \sqcap \exists \mathsf{hasBorderTo}.\mathsf{Ocean}$ and thus to OceanCountry. Modern DL systems like FaCT++ [57], Racer, Pellet [53], KAON2, HermiT and CEL [9] provide their users with inference services that solve these inference problems, which are also known as standard inferences.



3 Existing work on DLs and FCA 

The existing work done by other researchers towards bridging the gap between FCA and DLs, and the attempts to apply methods from one field to the other, can roughly be divided into two categories:

- efforts to enrich the language of FCA by borrowing constructors from DL languages [60, 45, 44, 23, 46],
- efforts to employ FCA methods in the solution of problems encountered in knowledge representation with DLs [1, 55, 10, 48, 49, 50, 4, 21, 20, 5].

Below we are going to discuss some of these efforts briefly. 



3.1 Enriching FCA with DL constructors 

Theory-driven logical scaling. In [45], Prediger and Stumme have used DLs in Conceptual Information Systems, which are data analysis tools based on FCA. Such systems can be used to extract data from a relational database and to store it in a formal context by using so-called conceptual scales. Prediger and Stumme have combined DLs with attribute exploration in order to define a new kind of conceptual scale. In this approach, DLs provide a rich language to specify which FCA attributes cannot occur together, and a DL reasoner is used during the attribute exploration process as an expert that answers the implication questions and provides a counterexample whenever an implication does not hold.



Terminological attribute logic. In [44], Prediger has worked on introducing logical constructors into FCA. She has enriched FCA with relations, existential and universal quantifiers, and negation, obtaining a language like the DL $\mathcal{ALC}$, which she has called terminologische Merkmalslogik (terminological attribute logic²). In the same work she has also presented applications of her approach in enriching formal contexts with new knowledge, applications in many-valued formal contexts, and applications for so-called scales, which are formal contexts that are used to obtain a standard formal context from a many-valued formal context.



Relational concept analysis. In [46], Rouane et al. have presented a combination of FCA and DLs that is called relational concept analysis. It is an adaptation of FCA that is intended for analyzing objects described by relational attributes in data mining. The approach is based on a collection of formal contexts, called a relational context family, together with relations between these contexts. The relations between the contexts are binary relations between pairs of object sets that belong to two different contexts. Processing these contexts and relations with relational concept analysis methods yields a set of concept lattices (one for each input context) such that the formal concepts in different lattices are linked by relational attributes, which are similar to roles in DLs, or associations in UML. One feature that distinguishes this approach from the other efforts that introduce relations into FCA is that the formal concepts and relations between formal concepts of different contexts can be mapped into concept descriptions in a sublanguage of $\mathcal{ALC}$, which is called $\mathcal{FLE}$ in [46]. $\mathcal{FLE}$ allows for conjunction, value restriction, existential restriction, and the top and bottom concepts. In this approach, after the formal concepts and relations have been obtained and mapped into $\mathcal{FLE}$ concept descriptions, DL reasoning is used to classify these descriptions and to check their consistency.
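The way relational attributes link the contexts can be illustrated by an existential scaling step: an object receives a relational attribute $\exists r{:}c$ whenever it is related via $r$ to at least one object in the extent of the concept $c$ of the target context. The Python sketch below shows only this single step under simplifying assumptions: the target concepts are given as precomputed extents, the object and concept names are hypothetical, and both the iteration that recomputes the lattices until a fixpoint is reached and any other scaling operators are omitted.

```python
def existential_scaling(objects, relation, role, target_extents):
    """Extend each object's attribute set with existential relational attributes.

    objects: dict object -> set of plain attributes in the source context
    relation: dict object -> set of related objects in the target context
    role: name of the relation, used to label the new attributes
    target_extents: dict concept label -> extent in the target context
    """
    scaled = {}
    for obj, attrs in objects.items():
        related = relation.get(obj, set())
        new_attrs = {
            f"exists {role}:{label}"
            for label, extent in target_extents.items()
            if related & extent  # at least one related object lies in the extent
        }
        scaled[obj] = set(attrs) | new_attrs
    return scaled


# Hypothetical source context (countries) and concepts of a target context (regions).
COUNTRIES = {"Portugal": {"country"}, "Austria": {"country"}}
HAS_BORDER_TO = {"Portugal": {"Atlantic", "Spain"}, "Austria": {"Germany"}}
REGION_CONCEPTS = {
    "Ocean": {"Atlantic"},
    "Land": {"Spain", "Germany"},
    "Region": {"Atlantic", "Spain", "Germany"},
}

if __name__ == "__main__":
    scaled = existential_scaling(COUNTRIES, HAS_BORDER_TO, "hasBorderTo", REGION_CONCEPTS)
    for obj, attrs in scaled.items():
        print(obj, sorted(attrs))
```

Read as a DL expression, the relational attribute `exists hasBorderTo:Ocean` plays the role of the concept description $\exists \mathsf{hasBorderTo}.\mathsf{Ocean}$.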



3.2 Applying FCA methods in DLs 

Subsumption hierarchy of conjunctions of DL concepts. In [1], Baader has used FCA for an efficient computation of an extended subsumption hierarchy of a set of DL concepts. More precisely, he used attribute exploration for computing the subsumption hierarchy of all conjunctions of a set of DL concepts. The main motivation for this work was to determine the interaction between defined concepts, which might not easily be seen by just looking at the subsumption hierarchy of defined concepts. To explain this, the following example has been given: assume that the defined concept NoDaughter stands for those people who have no daughters, NoSon stands for those people who have no sons, and NoSmallChild stands for those people who have no small children. Obviously, there is no subsumption relationship between these three concepts. On the other hand, the conjunction NoDaughter $\sqcap$ NoSon is subsumed by NoSmallChild, since a person without daughters and without sons has no children at all. In other words, if an individual $a$ belongs to NoSon and NoDaughter, it also belongs to NoSmallChild. However, this cannot be derived from the information that $a$ belongs to NoSon and NoDaughter by just looking at the subsumption hierarchy. This small example demonstrates that runtime inferences concerning individuals can be made faster by precomputing the subsumption hierarchy not only for defined concepts, but also for all conjunctions of defined concepts.

² This translation is ours.

For this purpose, Baader defined a formal context whose attributes were the defined DL concepts, and whose objects were all possible counterexamples to subsumption relationships, i.e., interpretations together with an element of the interpretation domain. This formal context has the property that its concept lattice is isomorphic to the required subsumption hierarchy, namely the subsumption hierarchy of conjunctions of the defined DL concepts. However, this formal context has the disadvantage that a standard subsumption algorithm cannot be used as an expert for this context within attribute exploration. To overcome this problem, the approach was reconsidered in [12], where a new formal context was introduced that has the same properties but for which a usual subsumption algorithm can be used as the expert.
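The interplay between attribute exploration and an expert can be sketched as follows. This Python code is a simplified illustration and not Baader's implementation: it follows the textbook formulation of attribute exploration (Next Closure over the sets closed under the implications accepted so far), and the expert is a hypothetical stand-in backed by a finite table of individuals, whereas in Baader's setting a DL subsumption reasoner answers the questions and produces counterexamples. The attribute names reuse the NoDaughter/NoSon/NoSmallChild example; the individuals `p1` to `p6` are invented.

```python
# Hypothetical ground truth consulted by the "expert": individuals and the
# defined concepts they belong to.
HIDDEN = {
    "p1": {"NoDaughter"},                           # has a small son
    "p2": {"NoSon"},                                # has a small daughter
    "p3": {"NoDaughter", "NoSon", "NoSmallChild"},  # has no children
    "p4": {"NoSmallChild"},                         # grown-up son and daughter
    "p5": {"NoSon", "NoSmallChild"},                # grown-up daughter only
    "p6": {"NoDaughter", "NoSmallChild"},           # grown-up son only
}
ATTRS = ["NoDaughter", "NoSon", "NoSmallChild"]  # fixed linear order for Next Closure


def prime_prime(attrs, context):
    """A'' in the current (partial) context."""
    closed = set(ATTRS)
    for has in context.values():
        if attrs <= has:
            closed &= has
    return closed


def implication_closure(attrs, implications):
    """Smallest superset of attrs that respects every accepted implication."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed


def next_closure(current, implications):
    """Lectically next implication-closed set after `current`, or None."""
    current = set(current)
    for i in reversed(range(len(ATTRS))):
        m = ATTRS[i]
        if m in current:
            current.discard(m)
            continue
        candidate = implication_closure(current | {m}, implications)
        if all(ATTRS.index(x) >= i for x in candidate - current):
            return candidate
    return None


def expert(premise, conclusion):
    """Return a counterexample (name, attributes), or None if the implication holds."""
    for name, has in HIDDEN.items():
        if premise <= has and not conclusion <= has:
            return name, has
    return None


def attribute_exploration():
    context, implications = {}, []
    current = set()
    while current is not None and current != set(ATTRS):
        while True:
            closed = prime_prime(current, context)
            if closed == current:
                break  # current is an intent of the context
            answer = expert(current, closed)
            if answer is None:
                implications.append((frozenset(current), frozenset(closed)))
                break  # current is a pseudo-intent; its implication is accepted
            name, has = answer
            context[name] = set(has)  # add the counterexample and ask again
        current = next_closure(current, implications)
    return context, implications


if __name__ == "__main__":
    ctx, base = attribute_exploration()
    print(sorted(ctx))
    for premise, conclusion in base:
        print(sorted(premise), "->", sorted(conclusion - premise))
```

With this table, the only implication accepted is NoDaughter, NoSon -> NoSmallChild, i.e., the subsumption NoDaughter $\sqcap$ NoSon $\sqsubseteq$ NoSmallChild from the example, while `p3` is never needed as a counterexample.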

Subsumption hierarchy of conjunctions and disjunctions of DL concepts. In [55], Stumme has extended the above-mentioned subsumption hierarchy further with disjunctions of DL concepts. More precisely, he presented how the complete lattice of all possible combinations of conjunctions and disjunctions of the concepts in a DL TBox can be computed by using FCA. To this aim, he used another knowledge acquisition tool of FCA instead of attribute exploration, namely distributive concept exploration [56]. In the lattice computed by this method, the supremum of two DL concepts corresponds to the disjunction of these concepts.

Subsumption hierarchy of least common subsumers. In [10], Baader and Molitor have used FCA for supporting the bottom-up construction of DL knowledge bases. In the bottom-up approach, the knowledge engineer does not directly define the concepts of her application domain; instead, she gives typical examples of a concept, and the system comes up with a concept description for these examples. The process of computing such a concept description consists of first computing the most specific concepts that the given examples belong to, and then computing the least common subsumer of these concepts. Here the choice of examples is crucial for the quality of the resulting concept description. If the examples are too similar, the resulting concept description will be too specific; conversely, if they are too distinct, the resulting concept description will be too general. To overcome this, Baader and Molitor have used attribute exploration for computing the subsumption hierarchy of all least common subsumers of a given set of concepts. In this hierarchy one can easily see the position of the least concept description that the chosen examples belong to, and decide whether these examples are appropriate for obtaining the intended concept description. However, there may be exponentially many least common subsumers, and, depending on the DL in use, both the least common subsumer computation and the subsumption test can be expensive operations. The use of attribute exploration provides complete information on what this hierarchy looks like without explicitly computing all least common subsumers and classifying them.



Relational exploration. In his Ph.D. thesis [49], Rudolph has combined DLs and FCA for acquiring complete relational knowledge about an application domain. In his approach, which he calls relational exploration, he uses DLs for defining FCA attributes, and FCA for refining DL knowledge bases. More precisely, DLs make use of the interactive knowledge acquisition method of FCA, and FCA benefits from DLs in terms of expressing relational knowledge.

In [48, 49], Rudolph uses for this purpose the DL $\mathcal{FLE}$, which allows for the constructors conjunction, existential restriction, and value restriction. In his previous work [47], he uses the DL $\mathcal{EL}$, which allows for the constructors conjunction and existential restriction. In both cases, he defines the semantics by means of a special pair of formal contexts called a binary power context family, which is used for expressing relations in FCA. Binary power context families have also been used for giving semantics to conceptual graphs. In order to collect information about the formulae expressible in $\mathcal{FLE}$, in [48, 49] he defines a formal context called an $\mathcal{FLE}$-context. The attributes of this formal context are $\mathcal{FLE}$ concept descriptions, and the objects are the elements of the domain over which these concept descriptions are interpreted. In this context, an object $g$ is in relation with an attribute $m$ if and only if $g$ is in the interpretation of $m$. Thus, an implication holds in this formal context if and only if, in the given model, the conjunction of the attributes in the premise of the implication is subsumed by the conjunction of the attributes in its conclusion. This is how implications in $\mathcal{FLE}$-contexts give rise to subsumption relationships between $\mathcal{FLE}$ concept descriptions.
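The correspondence between implications and subsumption in a fixed model can be checked directly. The following Python sketch is built on our own assumptions (a tuple encoding of $\mathcal{FLE}$ concept descriptions, a small invented model, and hypothetical function names): it constructs the context whose objects are domain elements and whose attributes are labelled concept descriptions, and compares implication validity in that context with subsumption of the corresponding conjunctions in the model.

```python
# FLE concept descriptions as tuples (encoding assumed for this sketch):
#   ("top",), ("name", A), ("and", C, D), ("some", r, C), ("all", r, C)

def ext(concept, model):
    """Interpretation of an FLE concept description in a finite model."""
    kind, domain = concept[0], model["domain"]
    if kind == "top":
        return set(domain)
    if kind == "name":
        return set(model["concepts"].get(concept[1], set()))
    if kind == "and":
        return ext(concept[1], model) & ext(concept[2], model)
    if kind in ("some", "all"):
        pairs = model["roles"].get(concept[1], set())
        filler = ext(concept[2], model)
        succ = {d: {e for (x, e) in pairs if x == d} for d in domain}
        if kind == "some":
            return {d for d in domain if succ[d] & filler}
        return {d for d in domain if succ[d] <= filler}
    raise ValueError(f"unknown constructor: {kind}")


def conj(concepts):
    """Conjunction of a list of concepts (top for the empty list)."""
    result = ("top",)
    for c in concepts:
        result = ("and", result, c)
    return result


# Hypothetical finite model.
MODEL = {
    "domain": {"ann", "bob", "cid", "dee"},
    "concepts": {"Female": {"ann", "cid", "dee"}, "Doctor": {"cid"}},
    "roles": {"hasChild": {("ann", "bob"), ("ann", "cid"), ("bob", "dee")}},
}

# Attributes of the FLE-context: labelled concept descriptions.
ATTRIBUTES = {
    "Female": ("name", "Female"),
    "Doctor": ("name", "Doctor"),
    "some hasChild.Female": ("some", "hasChild", ("name", "Female")),
    "some hasChild.Doctor": ("some", "hasChild", ("name", "Doctor")),
    "all hasChild.Doctor": ("all", "hasChild", ("name", "Doctor")),
}


def fle_context(model, attributes):
    """Object g has attribute m iff g lies in the interpretation of m."""
    extents = {label: ext(c, model) for label, c in attributes.items()}
    return {g: {m for m, e in extents.items() if g in e} for g in model["domain"]}


def implication_holds(context, premise, conclusion):
    """Every object having all premise attributes has all conclusion attributes."""
    return all(set(conclusion) <= has for has in context.values() if set(premise) <= has)


def subsumed_in_model(model, premise, conclusion):
    """The conjunction of the premise is subsumed by that of the conclusion in the model."""
    lhs = ext(conj([ATTRIBUTES[m] for m in premise]), model)
    rhs = ext(conj([ATTRIBUTES[m] for m in conclusion]), model)
    return lhs <= rhs


if __name__ == "__main__":
    ctx = fle_context(MODEL, ATTRIBUTES)
    print(implication_holds(ctx, ["some hasChild.Doctor"], ["some hasChild.Female"]))
    print(implication_holds(ctx, ["Female"], ["some hasChild.Female"]))
```

Here the implication from `some hasChild.Doctor` to `some hasChild.Female` holds because the only element with a doctor child (ann) also has a female child; note that this is subsumption in this one model, not w.r.t. all models.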

To obtain complete knowledge about the subsumption relationships in the given model between arbitrary $\mathcal{FLE}$ concepts, Rudolph gives a multi-step exploration algorithm. In the first step of the algorithm, he starts with an $\mathcal{FLE}$-context whose attributes are the atomic concepts occurring in a knowledge base. In exploration step $i + 1$, he defines the set of attributes as the union of the set of attributes from the first step, the set of concept descriptions formed by universally quantifying all attributes of the context at step $i$ w.r.t. all atomic roles, and the set of concept descriptions formed by existentially quantifying all concept intents of the context at step $i$ w.r.t. all atomic roles. Rudolph points out that, at an exploration step, there can be some concept descriptions in the attribute set that are equivalent, i.e., attributes that can be reduced. For this purpose, he introduces a method that he calls empiric attribute reduction. In principle, infinitely many exploration steps can be carried out, which means that the algorithm need not terminate. To guarantee termination, Rudolph restricts the number of exploration steps. After carrying out $i$ steps of exploration, it is then possible to decide subsumption (w.r.t. the given model) between any $\mathcal{FLE}$ concept descriptions up to role depth $i$ just by using the implication bases obtained as a result of the exploration steps. In addition, he also characterizes the cases where finitely many steps are sufficient to acquire complete information for deciding subsumption between $\mathcal{FLE}$ concept descriptions of arbitrary role depth. Rudolph argues that his method can be used to support knowledge engineers in designing, building and refining DL ontologies. This method has been implemented in the tool Relexo³.
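The construction of the attribute set for step $i + 1$ can be written down directly. In the Python sketch below, concept descriptions are plain strings, the intents of the step-$i$ context are passed in as given (in Rudolph's method they result from exploring that context), and `TOP` stands for the empty conjunction; the names and the string syntax are our own assumptions, and empiric attribute reduction is not modelled.

```python
def conjunction(attributes):
    """Render a set of attribute strings as one conjunction (TOP if empty)."""
    parts = sorted(attributes)
    if not parts:
        return "TOP"
    return parts[0] if len(parts) == 1 else "(" + " and ".join(parts) + ")"


def next_step_attributes(initial_attributes, step_attributes, step_intents, roles):
    """Attributes for step i+1: the step-1 attributes, every step-i attribute
    universally quantified over every role, and every step-i intent
    existentially quantified over every role."""
    new = set(initial_attributes)
    for r in roles:
        new |= {f"all {r}.{m}" for m in step_attributes}
        new |= {f"some {r}.{conjunction(intent)}" for intent in step_intents}
    return new


if __name__ == "__main__":
    initial = {"Female", "Doctor"}
    intents = [set(), {"Female"}, {"Doctor"}, {"Female", "Doctor"}]  # assumed step-1 intents
    for attribute in sorted(next_step_attributes(initial, initial, intents, ["hasChild"])):
        print(attribute)
```

Each new attribute adds one quantifier on top of a step-$i$ description, which is why the role depth grows by one per step and why bounding the number of steps bounds the role depth.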



Exploring Finite Models in the DL $\mathcal{EL}_{\mathrm{gfp}}$. In [4], Baader and Distel have extended classical FCA in order to provide support for analyzing relational structures by using efficient FCA algorithms. In this approach, the atomic attributes are replaced by complex formulae in some logical language, and data is represented using relational structures rather than just formal contexts. This extension is later instantiated with attributes defined in the DL $\mathcal{EL}$, and with relational structures defined over a signature of unary and binary predicates, i.e., models for $\mathcal{EL}$. In this setting an implication corresponds to a GCI in $\mathcal{EL}$. At first sight, this approach seems to be very close to the approach introduced in [48, 49]. One of the main differences between these approaches is that in [4] the authors use one context with infinitely many complex attributes, whereas in [48, 49] Rudolph uses an infinite family of contexts, each having finitely many attributes that are obtained by restricting the role depth of concepts. In [4] the authors additionally show that for the DLs $\mathcal{EL}$ and $\mathcal{EL}_{\mathrm{gfp}}$, which extends $\mathcal{EL}$ with cyclic concept definitions interpreted with greatest fixpoint semantics, the set of GCIs holding in a finite model always has a finite basis. That is, there is always a finite subset of the infinitely many GCIs from which the rest follows. Later, in [5], the authors have shown how to compute this basis efficiently by using methods from FCA. In a follow-up paper [22], Distel has described how this method can be modified to allow ABox individuals as counterexamples to GCIs.

4 Contributions to combining DLs and FCA 

Our contribution to DL research by means of FCA methods falls mainly under two topics: 1) supporting the bottom-up construction of DL knowledge bases, and 2) completing DL knowledge bases. In Section 4.1 we briefly describe the use of FCA in the former, and in Section 4.2 the use of FCA in the latter contribution.

4.1 Supporting Bottom-up Construction of DL Ontologies

Traditionally, DL knowledge bases are built in a top-down manner, in the sense that first the relevant notions of the domain are formalized by concept descriptions, and then these concept descriptions are used to specify properties of the individuals occurring in the domain. However, this top-down approach is not always adequate. On the one hand, it might not always be intuitive which notions of the domain are the relevant ones for a particular application. On the other hand, even if this is intuitive, it might not always be easy to come up with a clear formal description of these notions, especially for a domain expert who is not an expert in knowledge engineering. To overcome this, a new approach, called the "bottom-up approach", was introduced in [8] for constructing DL knowledge bases. In this approach, instead of directly defining a new concept, the domain expert introduces several typical examples as objects, which are then automatically generalized into a concept description by the system. This description is then offered to the domain expert as a possible candidate for a definition of the concept. The task of computing such a concept description can be split into two subtasks:

- computing the most specific concepts of the given objects,
- and then computing the least common subsumer of these concepts.

The most specific concept (msc) of an object $o$ is the most specific concept description $C$ expressible in the given DL language that has $o$ as an instance. The least common subsumer (lcs) of concept descriptions $C_1, \ldots, C_n$ is the most specific concept description $C$ expressible in the given DL language that subsumes $C_1, \ldots, C_n$. The problem of computing the lcs and (to a more limited extent) the msc has already been investigated in the literature [8, 39, 2].

³ http://relexo.ontoware.org

The methods for computing the least common subsumer are restricted to rather inexpressive description logics not allowing for disjunction (and thus not allowing for full negation). In fact, for languages with disjunction, the lcs of a collection of concepts is just their disjunction, and nothing new can be learned from building it. In contrast, for languages without disjunction, the lcs extracts the "commonalities" of the given collection of concepts. Modern DL systems like FaCT++ [33, 57], Racer, Pellet [51], and HermiT are based on very expressive DLs, and there exist large knowledge bases that use this expressive power and can be processed by these systems. In order to allow the user to re-use concepts defined in such existing knowledge bases and still support the user in defining new concepts with the bottom-up approach sketched above, in [15, 14, 16] we have proposed the following extended bottom-up approach: assume that there is a fixed background terminology defined in an expressive DL, e.g., a large ontology written by experts, which the user has bought from some ontology provider. The user then wants to extend this terminology in order to adapt it to the needs of a particular application domain. However, since the user is not a DL expert, he employs a less expressive DL and needs support through the bottom-up approach when building this user-specific extension of the background terminology. There are several reasons for the user to employ a restricted DL in this setting: first, such a restricted DL may be easier to comprehend and use for a non-expert; second, it may allow for a more intuitive graphical or frame-like user interface; third, to use the bottom-up approach, the lcs must exist and make sense, and it must be possible to compute it with reasonable effort.

To make this more precise, consider a background terminology (TBox) $\mathcal{T}$ defined in an expressive DL $L_2$. When defining new concepts, the user employs only a sublanguage $L_1$ of $L_2$, for which computing the lcs makes sense. However, in addition to primitive concepts and roles, the concept descriptions written in the DL $L_1$ may also contain names of concepts defined in $\mathcal{T}$. Let us call such concept descriptions $L_1(\mathcal{T})$-concept descriptions. Given $L_1(\mathcal{T})$-concept descriptions $C_1, \ldots, C_n$, we want to compute their lcs in $L_1(\mathcal{T})$, i.e., the least $L_1(\mathcal{T})$-concept description that subsumes $C_1, \ldots, C_n$ w.r.t. $\mathcal{T}$. In [14, 16] we have considered the case where $L_1$ is the DL $\mathcal{ALE}$ and $L_2$ is the DL $\mathcal{ALC}$, and shown the following result:

- If $\mathcal{T}$ is an acyclic $\mathcal{ALC}$-TBox, then the lcs w.r.t. $\mathcal{T}$ of $\mathcal{ALE}(\mathcal{T})$-concept descriptions always exists.

Unfortunately, the proof of this result does not yield a practical algorithm. For this reason, in [14, 16, 53] we have developed a more practical approach. Assume that $L_1$ is a DL for which least common subsumers (without background TBox) always exist. Given $L_1(\mathcal{T})$-concept descriptions $C_1, \ldots, C_n$, one can compute a common subsumer w.r.t. $\mathcal{T}$ by just ignoring $\mathcal{T}$, i.e., by treating the defined names in $C_1, \ldots, C_n$ as primitive and computing the lcs of $C_1, \ldots, C_n$ in $L_1$. However, the common subsumer obtained this way will usually be too general. In [14, 16, 53] we presented a method for computing "good" common subsumers (gcs) w.r.t. background TBoxes, which may not be the least common subsumers, but which are better than the common subsumers computed by ignoring the TBox. In the present work we do not give the gcs algorithm in detail; we only demonstrate it on an example. The algorithm is described in detail in [16].

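Example 1. As a simple example, consider the $\mathcal{ALC}$-TBox $\mathcal{T}$:

$$
\begin{aligned}
\mathsf{NoSon} &\equiv \forall \mathsf{has\text{-}child}.\mathsf{Female},\\
\mathsf{NoDaughter} &\equiv \forall \mathsf{has\text{-}child}.\neg\mathsf{Female},\\
\mathsf{SonRichDoctor} &\equiv \forall \mathsf{has\text{-}child}.(\mathsf{Female} \sqcup (\mathsf{Doctor} \sqcap \mathsf{Rich})),\\
\mathsf{DaughterHappyDoctor} &\equiv \forall \mathsf{has\text{-}child}.(\neg\mathsf{Female} \sqcup (\mathsf{Doctor} \sqcap \mathsf{Happy})),\\
\mathsf{ChildrenDoctor} &\equiv \forall \mathsf{has\text{-}child}.\mathsf{Doctor},
\end{aligned}
$$

and the $\mathcal{ALE}(\mathcal{T})$-concept descriptions

$$
\begin{aligned}
C &:= \exists \mathsf{has\text{-}child}.(\mathsf{NoSon} \sqcap \mathsf{DaughterHappyDoctor}),\\
D &:= \exists \mathsf{has\text{-}child}.(\mathsf{NoDaughter} \sqcap \mathsf{SonRichDoctor}).
\end{aligned}
$$

By ignoring the TBox, we obtain the $\mathcal{ALE}(\mathcal{T})$-concept description $\exists \mathsf{has\text{-}child}.\top$ as a common subsumer of $C$ and $D$. However, if we take into account that both $\mathsf{NoSon} \sqcap \mathsf{DaughterHappyDoctor}$ and $\mathsf{NoDaughter} \sqcap \mathsf{SonRichDoctor}$ are subsumed by the concept ChildrenDoctor (for instance, all children of an instance of $\mathsf{NoSon} \sqcap \mathsf{DaughterHappyDoctor}$ are female and hence happy doctors), then we obtain the more specific common subsumer $\exists \mathsf{has\text{-}child}.\mathsf{ChildrenDoctor}$. The gcs of $C$ and $D$ is even more specific. In fact, the least conjunction of (negated) concept names subsuming both $\mathsf{NoSon} \sqcap \mathsf{DaughterHappyDoctor}$ and $\mathsf{NoDaughter} \sqcap \mathsf{SonRichDoctor}$ is

$$
\mathsf{ChildrenDoctor} \sqcap \mathsf{DaughterHappyDoctor} \sqcap \mathsf{SonRichDoctor},
$$

and thus the gcs of $C$ and $D$ is

$$
\exists \mathsf{has\text{-}child}.(\mathsf{ChildrenDoctor} \sqcap \mathsf{DaughterHappyDoctor} \sqcap \mathsf{SonRichDoctor}).
$$

The conjunct ChildrenDoctor is actually redundant since it is implied by the remainder of the conjunction. ◇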
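The first, TBox-ignoring step of Example 1 is an ordinary lcs computation in which the defined names are treated as primitive. Since $C$ and $D$ use only conjunction and existential restrictions, a sketch for the DL $\mathcal{EL}$ suffices. It uses the well-known product construction for $\mathcal{EL}$ without a TBox: intersect the concept names, and pair up existential restrictions over the same role recursively. The Python encoding and function names are our own assumptions, and the result may contain redundant conjuncts that a real implementation would remove.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EL:
    """An EL concept: a conjunction of concept names and existential restrictions."""
    names: frozenset = frozenset()
    exists: tuple = ()  # pairs (role, EL filler)


def lcs(c, d):
    """Least common subsumer of two EL concepts (no TBox), via the product construction."""
    names = c.names & d.names
    exists = tuple(
        (r, lcs(fc, fd))
        for r, fc in c.exists
        for s, fd in d.exists
        if r == s  # only restrictions over the same role are combined
    )
    return EL(names, exists)


def show(c):
    """Render a concept using DL syntax; the empty conjunction is ⊤."""
    parts = sorted(c.names) + sorted(f"∃{r}.{_wrap(f)}" for r, f in c.exists)
    return " ⊓ ".join(parts) if parts else "⊤"


def _wrap(c):
    return f"({show(c)})" if len(c.names) + len(c.exists) > 1 else show(c)


# Example 1, with the defined names treated as primitive.
C = EL(exists=(("has-child", EL(frozenset({"NoSon", "DaughterHappyDoctor"}))),))
D = EL(exists=(("has-child", EL(frozenset({"NoDaughter", "SonRichDoctor"}))),))

if __name__ == "__main__":
    print(show(lcs(C, D)))
```

Because the fillers of $C$ and $D$ share no concept names, the product collapses to $\exists \mathsf{has\text{-}child}.\top$, which is exactly the overly general common subsumer mentioned in the example.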

In order to implement the gcs algorithm, we must be able to compute the smallest conjunction of (negated) concept names that subsumes two such conjunctions $C_1$ and $C_2$ w.r.t. $\mathcal{T}$. In principle, one can compute this smallest conjunction by testing, for every (negated) concept name, whether it subsumes both $C_1$ and $C_2$ w.r.t. $\mathcal{T}$, and then taking the conjunction of those (negated) concept names for which the test was positive. However, this results in a large number of (possibly expensive) calls to the subsumption algorithm for $L_2$ w.r.t. (general or (a)cyclic) TBoxes. Since, in our application scenario (bottom-up construction of DL knowledge bases w.r.t. a given background terminology), the TBox $\mathcal{T}$ is assumed to be fixed, it makes sense to precompute this information.

This is where FCA comes into play. By using the attribute exploration method [25] (possibly with background knowledge [25,26,27]), we compute the abovementioned smallest conjunction, which is required for computing a gcs. For this purpose, we define a formal context whose concept lattice is isomorphic to the subsumption hierarchy we are interested in. In general, the subsumption relation induces a partial order, and not a lattice structure, on concepts. However, in the case of conjunctions of (negated) concept names, all infima exist, and thus also all suprema, i.e., this hierarchy is a complete lattice. The experimental results in [16] have shown that computing this hierarchy and using it in the gcs computation is indeed quite efficient.
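Once a complete set of valid implications between the (negated) concept names is available (for instance, as the outcome of attribute exploration), no further subsumption calls are needed: the literals subsuming a conjunction are exactly its closure under these implications, and the smallest conjunction subsuming both inputs is the intersection of the two closures, i.e., the intent of the supremum of the two corresponding formal concepts. The following is a minimal sketch under that assumption; attributes are plain strings such as "A" and "not E", and the base shown is hypothetical.

```python
def closure(attrs: set, implications: list) -> frozenset:
    """Smallest superset of attrs closed under every implication (premise, conclusion)."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return frozenset(closed)


def smallest_common_conjunction(c1: set, c2: set, implications: list) -> frozenset:
    """Intersection of the two closures: the intent of the supremum of both concepts."""
    return closure(c1, implications) & closure(c2, implications)


# Hypothetical precomputed base containing the single implication A -> B.
base = [({"A"}, {"B"})]
print(sorted(smallest_common_conjunction({"A", "not E"}, {"B", "not E"}, base)))
# ['B', 'not E']
```

This only gives correct results if the implication base is complete for the fixed TBox, which is what the exploration step is meant to guarantee.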

4.2 Completing DL Ontologies 

The standardization of OWL [34] as the ontology language for the semantic web [13] has led several ontology editors, such as Protégé [3^] and Swoop [37], to support OWL, and ontologies written in OWL are employed in more and more applications. As the size of these ontologies grows, tools that help improve their quality become more important. The tools available so far use DL reasoning to detect inconsistencies and to infer consequences, i.e., implicit knowledge that can be deduced from the explicitly represented knowledge. There are also promising approaches that make it possible to pinpoint the reasons for inconsistencies and for certain consequences, and that help the ontology engineer to resolve inconsistencies and to remove unwanted consequences [51,36,35,32,11,43]. These approaches address the quality dimension of soundness of an ontology, both within itself (consistency) and w.r.t. the intended application domain (no unwanted consequences). In [6,7] we have considered a different quality dimension: completeness. We have provided a basis for formally well-founded techniques and tools that support the ontology engineer in checking whether an ontology contains all the relevant information about the application domain, and in extending the ontology appropriately if this is not the case.



As already mentioned, a DL knowledge base (nowadays often called ontol- 
ogy) usually consists of two parts, the terminological part (TBox), which defines 
concepts and also states additional constraints (GCIs) on the interpretation of 
these concepts, and the assertional part (ABox), which describes individuals and 
their relationships to each other and to concepts. Given an application domain
and a DL knowledge base describing it, we can ask whether the knowledge base 
contains all the relevant information about the domain: 

— Are all the relevant constraints that hold between concepts in the domain 
captured by the TBox? 

— Are all the relevant individuals existing in the domain represented in the 
ABox? 

As an example, consider the OWL ontology for human protein phosphatases 
that has been described and used in [59]. This ontology was developed based on
information from peer-reviewed publications. The human protein phosphatase 
family has been well characterised experimentally, and detailed knowledge about 
different classes of such proteins is available. This knowledge is represented in the 
terminological part of the ontology. Moreover, a large set of human phosphatases 
has been identified and documented by expert biologists. These are described as 
individuals in the assertional part of the ontology. One can now ask whether the 
information about protein phosphatases contained in this ontology is complete: 
are all the relationships that hold among the introduced classes of phosphatases 
captured by the constraints in the TBox, or are there relationships that hold 
in the domain, but do not follow from the TBox? Are all possible kinds of 
human protein phosphatases represented by individuals in the ABox, or are there phosphatases that have not yet been included in the ontology, or that have not even been identified yet?

Such questions cannot be answered by an automated tool alone. Clearly, 
to check whether a given relationship between concepts — which does not follow 
from the TBox — holds in the domain, one needs to ask a domain expert, and the 
same is true for questions regarding the existence of individuals not described in 
the ABox. The role of the automated tool is to ensure that the expert is asked as 
few questions as possible; in particular, she should not be asked trivial questions, 
i.e., questions that could actually be answered based on the represented knowl- 
edge. In the above example, answering a non-trivial question regarding human 
protein phosphatases may require the biologist to study the relevant literature, 
query existing protein databases, or even to carry out new experiments. Thus, 
the expert may be prompted to acquire new biological knowledge. 

The attribute exploration method of FCA has proved to be a successful 
knowledge acquisition method in various application domains. One of the earliest 
applications of this approach is described in [S5], where the domain is lattice 
theory, and the goal of the exploration process is to find, on the one hand, all valid relationships between properties of lattices (like being distributive) and, on the other hand, counterexamples to all the relationships that do not hold. To answer a query whether a certain relationship holds, the lattice theory expert must either confirm the relationship (by using results from the literature or by carrying out a new proof for this fact), or give a counterexample (again, by either finding one in the literature or constructing a new one).
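To make this interaction concrete, here is a minimal sketch of classical attribute exploration for complete contexts, following Ganter's Next Closure-based formulation: candidate premises $A$ are enumerated in lectic order, the expert is asked whether $A \to A''$ holds in the current context, and either the implication is recorded or the expert's counterexample is added. The expert is simulated by a hypothetical `make_expert` function that consults a fixed set of object intents; attribute names and data are made up for illustration.

```python
from typing import Callable, Optional


def implication_closure(attrs, implications: list) -> frozenset:
    """Close attrs under the accepted implications (premise, conclusion)."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return frozenset(closed)


def context_closure(attrs: frozenset, objects: list, all_attrs: frozenset) -> frozenset:
    """A'': attributes shared by all object intents that contain every attribute in attrs."""
    result = set(all_attrs)
    for intent in objects:
        if attrs <= intent:
            result &= intent
    return frozenset(result)


def next_closure(current: frozenset, order: list, close: Callable) -> Optional[frozenset]:
    """Lectically next closed set after `current` (Next Closure), or None if there is none."""
    current = set(current)
    for i in range(len(order) - 1, -1, -1):
        m = order[i]
        if m in current:
            current.discard(m)
        else:
            candidate = close(current | {m})
            # Accept only if no attribute smaller than m was newly added.
            if all(order.index(x) >= i for x in candidate - current):
                return candidate
    return None


def explore(order: list, expert: Callable) -> tuple:
    """expert(premise, conclusion) returns None to confirm the implication,
    or the intent of a counterexample otherwise."""
    all_attrs = frozenset(order)
    objects: list = []  # intents of the counterexamples collected so far
    base: list = []     # accepted implications (premise, conclusion)
    premise: Optional[frozenset] = frozenset()  # closure of the empty set under the empty base
    while premise is not None:
        while True:
            conclusion = context_closure(premise, objects, all_attrs)
            if conclusion == premise:
                break  # premise is already closed in the current context
            counterexample = expert(premise, conclusion)
            if counterexample is None:
                base.append((premise, conclusion))
                break
            objects.append(frozenset(counterexample))
        if premise == all_attrs:
            premise = None
        else:
            premise = next_closure(premise, order, lambda s: implication_closure(s, base))
    return base, objects


def make_expert(universe: list) -> Callable:
    """Hypothetical expert who knows every object of the domain."""
    def expert(premise, conclusion):
        for intent in universe:
            if premise <= intent and not conclusion <= intent:
                return intent
        return None
    return expert


universe = [frozenset("ab"), frozenset("b"), frozenset("c")]
base, objects = explore(["a", "b", "c"], make_expert(universe))
for premise, conclusion in base:
    print(sorted(premise), "->", sorted(conclusion - premise))
# ['b', 'c'] -> ['a']
# ['a'] -> ['b']
```

Note that this sketch assumes complete knowledge about each counterexample (a full intent), which is precisely the assumption that fails for DL knowledge bases.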

Although this sounds very similar to what is needed in our case, we cannot 
directly use this approach. The main reason is the open-world semantics of description logic knowledge bases. Consider an individual $i$ from an ABox $\mathcal{A}$ and a concept $C$ occurring in a TBox $\mathcal{T}$. If we cannot deduce from $\mathcal{T}$ and $\mathcal{A}$ that $i$ is an instance of $C$, then we do not assume that $i$ does not belong to $C$. Instead, we only accept this as a consequence if $\mathcal{T}$ and $\mathcal{A}$ imply that $i$ is an instance of $\neg C$. Thus, our knowledge about the relationships between individuals and concepts is incomplete: if $\mathcal{T}$ and $\mathcal{A}$ imply neither $C(i)$ nor $\neg C(i)$, then we do not know the relationship between $i$ and $C$. In contrast, classical FCA and
attribute exploration assume that the knowledge about objects is complete: a 
cross in row g and column m of a formal context says that object g has attribute 
m, and the absence of a cross is interpreted as saying that g does not have m. 

There has been some work on how to extend FCA and attribute exploration 
from complete knowledge to the case of partial knowledge [25,18,30,31,19,49],
and how to evaluate formulas in formal contexts that do not contain complete 
information [42]. However, these works are based on assumptions that are dif- 
ferent from ours. In particular, they assume that the expert cannot answer all 
queries and, as a consequence, the knowledge obtained after the exploration 
process may still be incomplete and the relationships between concepts that are 
produced in the end fall into two categories: relationships that are valid no mat- 
ter how the incomplete part of the knowledge is completed, and relationships 
that are valid only in some completions of the incomplete part of the knowledge. 
In contrast, our intention is to complete the knowledge base, i.e., in the end 
we want to have complete knowledge about these relationships. What may be 
incomplete is the description of individuals used during the exploration process. 

In [7,53] we have introduced an extension of FCA that can deal with partial knowledge. This extension is based on the notion of a partial context, which consists of a set of partial object descriptions (pods). A pod is a pair $(A, S)$, where $A$ represents the set of attributes that the pod is known to have, and $S$ represents the set of attributes that the pod is known not to have. $A$ and $S$ are disjoint, and their union need not be the whole attribute set, i.e., for some attributes it might be unknown whether the pod has them or not. We say that a pod $(A, S)$ refutes an implication $L \to R$ iff $L \subseteq A$ and $R \cap S \neq \emptyset$. We also say that a partial context refutes an implication if there is a pod in this partial context that refutes this implication. Based on these notions, we define an undecided implication as an implication that does not follow from a given set of implications and that is not refuted by a partial context. The attribute exploration method for partial contexts can then be formulated as enumerating undecided implications as efficiently as possible. In [7,53] we have described a version of the attribute exploration algorithm that works in this setting, and proved that this algorithm terminates and is correct. Later, we have shown that, given a DL knowledge base $(\mathcal{T}, \mathcal{A})$, any individual in $\mathcal{A}$ gives rise to a pod, and thus $\mathcal{A}$ induces a partial context. This enables us to use our attribute exploration algorithm on partial contexts for completing DL knowledge bases. As a result of running this algorithm on a DL knowledge base, the knowledge base is complete w.r.t. an intended interpretation, i.e., if an implication holds in this interpretation, then it also follows from the TBox, and if not, then the ABox contains a counterexample to this implication. For details of attribute exploration on partial contexts and its application to DL ontologies we refer the reader to [7,53]; here we demonstrate on a small example how it works.

Table 1. The partial context before completion
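The definitions of a pod, of refutation, and of an undecided implication can be written down almost literally. The sketch below is our own illustration with made-up attribute names; it decides whether an implication follows from a set of implications by closing its premise, which is sound and complete for implications between attribute sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    """Partial object description (A, S): attributes known to hold / known not to hold."""
    positive: frozenset
    negative: frozenset

    def __post_init__(self):
        if self.positive & self.negative:
            raise ValueError("positive and negative parts of a pod must be disjoint")

    def refutes(self, premise: frozenset, conclusion: frozenset) -> bool:
        """(A, S) refutes L -> R iff L is a subset of A and R meets S."""
        return premise <= self.positive and bool(conclusion & self.negative)


def closure(attrs: frozenset, implications: list) -> frozenset:
    """Close attrs under the implications (premise, conclusion)."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return frozenset(closed)


def follows(premise: frozenset, conclusion: frozenset, implications: list) -> bool:
    """L -> R follows from the implications iff R is contained in the closure of L."""
    return conclusion <= closure(premise, implications)


def is_undecided(premise, conclusion, implications, partial_context) -> bool:
    """Neither implied by the given implications nor refuted by any pod."""
    return (not follows(premise, conclusion, implications)
            and not any(pod.refutes(premise, conclusion) for pod in partial_context))


# Made-up pod: known to have a and b, known to lack c, nothing known about d.
pod = Pod(frozenset({"a", "b"}), frozenset({"c"}))
print(pod.refutes(frozenset({"a"}), frozenset({"c"})))              # True
print(is_undecided(frozenset({"a"}), frozenset({"d"}), [], [pod]))  # True
```

The implication $\{a\} \to \{d\}$ stays undecided because the pod is silent about $d$; under the closed-world reading of classical FCA it would instead count as refuted.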



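Example 2. Let our TBox $\mathcal{T}_{\text{countries}}$ contain the following concept definitions:

$$
\begin{aligned}
\text{AsianCountry} &\equiv \text{Country} \sqcap \exists \text{hasTerritoryIn}.\{\text{Asia}\}\\
\text{EUmember} &\equiv \text{Country} \sqcap \exists \text{memberOf}.\{\text{EU}\}\\
\text{EuropeanCountry} &\equiv \text{Country} \sqcap \exists \text{hasTerritoryIn}.\{\text{Europe}\}\\
\text{G8member} &\equiv \text{Country} \sqcap \exists \text{memberOf}.\{\text{G8}\}\\
\text{IslandCountry} &\equiv \text{Country} \sqcap \neg \exists \text{hasTerritoryIn}.\text{Continent}\\
\text{MediterraneanCountry} &\equiv \text{Country} \sqcap \exists \text{hasBorderTo}.\{\text{MediterraneanSea}\}
\end{aligned}
$$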

Moreover, let our ABox $\mathcal{A}_{\text{countries}}$ contain the individuals Syria, Turkey, France, Germany, Switzerland, and USA, and assume we are interested in the subsumption relationships between the concept names AsianCountry, EUmember, EuropeanCountry, G8member, and MediterraneanCountry. Table 1 shows the partial context induced by $\mathcal{A}_{\text{countries}}$, and Table 2 shows the questions asked by the completion algorithm and the answers given to these questions. To save space, the concept names are shortened in both tables. The questions with positive answers result in the TBox being extended with the following GCIs:

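$$
\begin{aligned}
\text{G8member} \sqcap \text{MediterraneanCountry} &\sqsubseteq \text{EUmember} \sqcap \text{EuropeanCountry}\\
\text{EUmember} \sqcap \text{G8member} &\sqsubseteq \text{EuropeanCountry}\\
\text{AsianCountry} \sqcap \text{EUmember} &\sqsubseteq \text{MediterraneanCountry}\\
\text{AsianCountry} \sqcap \text{EUmember} \sqcap \text{EuropeanCountry} \sqcap \text{MediterraneanCountry} &\sqsubseteq \text{G8member}
\end{aligned}
$$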



Moreover, the questions with negative answers result in the ABox being extended with the individuals Russia, Cyprus, Spain, and Japan. The partial context induced by the resulting ABox $\mathcal{A}'_{\text{countries}}$ is shown in Table 3. The resulting knowledge base is complete w.r.t. the initially selected concept names.

Question                                          Answer   Counterex.
{G8, Mediterranean} → {EU, European}?             yes      -
{European, G8} → {EU}?                            no       Russia
{EU} → {European, G8}?                            no       Cyprus
{EU, G8} → {European}?                            yes      -
{EU, European} → {G8}?                            no       Spain
{Asian, G8} → {European}?                         no       Japan
{Asian, EU} → {Mediterranean}?                    yes      -
{Asian, EU, European, Mediterranean} → {G8}?      yes      -

Table 2. Execution of the ontology completion algorithm

Table 3. The partial context after completion
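The answers in Table 2 can be cross-checked mechanically: every counterexample must refute the question it was given for, and no counterexample may refute a question that was answered "yes". The sketch below records, for each new individual, only the attribute values such a refutation requires. For Cyprus, the question leaves open whether it lacks European or G8; that it is known not to be a G8 member is our illustrative assumption.

```python
def refutes(has: set, lacks: set, premise: set, conclusion: set) -> bool:
    """A pod (has, lacks) refutes premise -> conclusion iff premise is in has and conclusion meets lacks."""
    return premise <= has and bool(conclusion & lacks)


# New individuals with only the values needed for a refutation
# (for Cyprus, "not G8" is an assumption made for illustration).
counterexamples = {
    "Russia": ({"European", "G8"}, {"EU"}),
    "Cyprus": ({"EU"}, {"G8"}),
    "Spain": ({"EU", "European"}, {"G8"}),
    "Japan": ({"Asian", "G8"}, {"European"}),
}

# Questions answered "no" in Table 2, together with the counterexample given.
rejected = [
    ({"European", "G8"}, {"EU"}, "Russia"),
    ({"EU"}, {"European", "G8"}, "Cyprus"),
    ({"EU", "European"}, {"G8"}, "Spain"),
    ({"Asian", "G8"}, {"European"}, "Japan"),
]

# Questions answered "yes" in Table 2 (added to the TBox as GCIs).
accepted = [
    ({"G8", "Mediterranean"}, {"EU", "European"}),
    ({"EU", "G8"}, {"European"}),
    ({"Asian", "EU"}, {"Mediterranean"}),
    ({"Asian", "EU", "European", "Mediterranean"}, {"G8"}),
]

for premise, conclusion, name in rejected:
    has, lacks = counterexamples[name]
    assert refutes(has, lacks, premise, conclusion), name

# No new individual may refute an implication the expert confirmed.
for premise, conclusion in accepted:
    assert not any(refutes(h, l, premise, conclusion) for h, l in counterexamples.values())
print("every counterexample refutes its question; no confirmed GCI is refuted")
```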



Based on the described approach, we implemented a first experimental version of a DL knowledge base completion tool as an extension for the Swoop ontology editor, using Pellet as the underlying reasoner. A first evaluation of this tool on the OWL ontology for human protein phosphatases, with biologists as experts, was quite promising, but also showed that the tool must be improved in order to be useful in practice. In particular, we have observed that the experts sometimes make errors when answering queries. Thus, the tool should support the expert in detecting such errors, and also make it possible to correct errors without having to restart the completion process from scratch. Another usability issue on the wish list of our experts was the possibility to postpone answering certain questions while continuing the completion process with other questions.

In a follow-up paper |T3] we have addressed these usability issues. We have 
improved the method in such a way that at any time during completion the 
expert can pause the process, see all of her previous answers or changes to the 
knowledge base, 'undo' some of those changes, and continue completion. Here, we of course made sure that the expert does not have to answer again the questions she had already answered before pausing the process. We have achieved this by saving previous answers and using them as background knowledge when the expert continues completion. The other wish of our experts, namely postponing questions, was addressed by pausing completion, changing the order of attributes, and restarting the completion with the previous answers as background knowledge. In theory, this method is not guaranteed to postpone a question, so the expert might be asked the last question again. However, in practice the method turned out to be useful in many cases where the expert was not able to answer a particular question and wanted to get another one. We have implemented our ontology completion method together with these usability improvements as a plugin for the Protégé ontology editor under the name OntoComP.
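The idea of reusing earlier answers can be pictured as a filter in front of the expert: a question is skipped if it already follows from previously confirmed implications (used as background knowledge) or is refuted by a previously supplied counterexample. This is a hypothetical sketch of the idea only, not OntoComP's code; the `SavedAnswers` class, its method, and the attribute names are our own.

```python
from dataclasses import dataclass, field


def closure(attrs: frozenset, implications: list) -> frozenset:
    """Close attrs under the implications (premise, conclusion)."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return frozenset(closed)


@dataclass
class SavedAnswers:
    """Hypothetical store of answers given before the expert paused completion."""
    confirmed: list = field(default_factory=list)        # accepted (premise, conclusion) pairs
    counterexamples: list = field(default_factory=list)  # (has, lacks) pairs of new individuals

    def needs_expert(self, premise: frozenset, conclusion: frozenset) -> bool:
        """False if the question is already settled by earlier answers."""
        if conclusion <= closure(premise, self.confirmed):
            return False  # follows from earlier "yes" answers (background knowledge)
        for has, lacks in self.counterexamples:
            if premise <= has and conclusion & lacks:
                return False  # refuted by an earlier counterexample
        return True


saved = SavedAnswers(
    confirmed=[(frozenset({"EU", "G8"}), frozenset({"European"}))],
    counterexamples=[(frozenset({"European", "G8"}), frozenset({"EU"}))],
)
print(saved.needs_expert(frozenset({"Asian", "EU", "G8"}), frozenset({"European"})))  # False
print(saved.needs_expert(frozenset({"Asian"}), frozenset({"EU"})))  # True
```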



5 Conclusion 

We have summarized the work done in combining DLs and FCA. The research 
done in this field mainly falls under two categories: 1) efforts to enrich the lan- 
guage of FCA by borrowing constructors from DL languages, and 2) efforts to 
employ FCA methods in the solution of problems encountered in knowledge representation with DLs. For each of these categories we have given pointers and briefly described the relevant work in the literature. We have also described our own contributions, which mainly fall under the second category.

Recent developments in information technologies like social networks, Web 
2.0 applications and semantic web applications are bringing up new challenges 
for representing vast amounts of knowledge and analyzing huge amounts of data 
rapidly generated by these applications. The two research areas we have discussed here, namely DLs and FCA, lie at the core of representing knowledge and analyzing data, respectively. We are confident that these new challenges will give rise to fruitful new collaborations between these two research fields.